An interactive tutorial on text-to-speech synthesis from diphones in time domain
Abstract
We present an interactive course on speech synthesis designed to support education in speech communication. The basic section explains the fundamental principles of speech synthesis. To explore a complete text-to-speech (TTS) system, the user is given access to the Dresden Speech Synthesizer DreSS. The user may type any text and observe how the system processes it, from the initial linguistic preprocessing through to acoustic synthesis. A further section is devoted to the crucial problem of correctly segmenting the speech elements used for concatenative synthesis. Users may select their own diphone segments from a given speech database. The quality of the segments may be evaluated acoustically, and hints are given for avoiding cutting errors. In this way, the user learns how to select good-quality segments. The course is written in HTML and Java and is designed for use over the Internet.
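As a rough illustration of what concatenative synthesis from diphones involves, the sketch below turns a phone sequence into diphone names and joins waveform segments in the time domain with a short linear crossfade. It is not taken from DreSS or from the course: the function names, the `_` silence symbol, the `a-b` naming scheme, and the crossfade length are all assumptions made for this example, and a real system would normally also adjust pitch and duration at the joins.

```python
from typing import List


def phones_to_diphones(phones: List[str], silence: str = "_") -> List[str]:
    """Pad the phone sequence with silence and pair each phone with its successor."""
    padded = [silence] + phones + [silence]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def concatenate(units: List[List[float]], overlap: int = 2) -> List[float]:
    """Join waveform segments in the time domain using a linear crossfade.

    `overlap` is the number of samples blended at each join (assumed value).
    """
    out: List[float] = []
    for seg in units:
        # Never overlap more samples than either side actually has.
        n = min(overlap, len(out), len(seg))
        start = len(out) - n
        for i in range(n):
            w = (i + 1) / (n + 1)  # fade-in weight for the incoming segment
            out[start + i] = (1 - w) * out[start + i] + w * seg[i]
        out.extend(seg[n:])
    return out


if __name__ == "__main__":
    print(phones_to_diphones(["h", "a", "l", "o"]))
    print(concatenate([[1.0, 1.0, 1.0], [0.0, 0.0, 0.0]], overlap=2))
```

The crossfade only smooths amplitude discontinuities at the join; badly placed segment boundaries, which the course's segmentation section is concerned with, would still be audible.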
Similar resources
Study on Unit-Selection and Statistical Parametric Speech Synthesis Techniques
One interesting topic in the multimedia domain is enabling computers to produce speech. Speech synthesis grants the computer the human ability of speech production. The data-based approach and the process-based approach are the two main approaches to speech synthesis. Each approach has its own challenges. Unit-selection speech synthesis and statistical parametr...
Halfphones: A Backoff Mechanism for Diphone Unit Selection Synthesis
Diphone backoff mechanisms in text-to-speech ensure that the text is synthesized even if some of the diphones it requires are missing from the speech database. This paper describes an automatic method for synthetically creating missing diphones from halfphones that are in the speech database.
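A minimal sketch of the backoff idea, assuming a diphone `a-b` spans from the middle of phone `a` to the middle of phone `b`, so that it can be approximated by the right half of `a` followed by the left half of `b`. The dictionary layout and the `_R` / `_L` key suffixes are hypothetical, and the plain join shown here stands in for whatever construction the paper actually uses.

```python
from typing import Dict, List, Optional

Wave = List[float]


def lookup_diphone(
    name: str,
    diphones: Dict[str, Wave],
    halfphones: Dict[str, Wave],
) -> Optional[Wave]:
    """Return the stored diphone, or back off to two halfphones if it is missing."""
    if name in diphones:
        return diphones[name]
    left, right = name.split("-")
    # Hypothetical key scheme: "a_R" is the right half of phone a,
    # "b_L" is the left half of phone b.
    first = halfphones.get(left + "_R")
    second = halfphones.get(right + "_L")
    if first is None or second is None:
        return None  # no backoff possible for this diphone
    return first + second


if __name__ == "__main__":
    diphones = {"a-b": [1.0]}
    halfphones = {"b_R": [2.0], "c_L": [3.0]}
    print(lookup_diphone("a-b", diphones, halfphones))
    print(lookup_diphone("b-c", diphones, halfphones))
    print(lookup_diphone("c-d", diphones, halfphones))
```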
Design and evaluation of prosodically-sensitive concatenative units for a Korean TTS system
This paper describes the design and evaluation of prosodically-sensitive concatenative units for a Korean text-to-speech (TTS) synthesis system. The diphones used are prosodically conditioned in the sense that a single conventional diphone is stored as different versions taken directly from the different prosodic domains of the prosodically labeled, read sentences. The four levels of the Korean...
A database design for a TTS synthesis system using lexical diphones
Database designs based on the premise that English has about 2000 diphones, as stated in many publications and online documents, are likely to yield a diphone database that fails to capture some important phonological phenomena of English. This paper proposes a TTS database built from diphones that include their syllabic stress; we term these units lexical dip...
Extraction of Di-phones for Telugu: Issues and solutions
This paper describes a method for extracting diphones to generate a diphone database for concatenative text-to-speech systems. A diphone is an adjacent pair of phones. Diphones are a very important resource for both text-to-speech (TTS) and speech-to-text (STT). Consider the pronunciation of -kaaki. It consists of phonemes [k], అ [a], అ [a], [k], ఇ [i]. The diphones generated while pronouncing the ...